ci(release): gate python wheels on e2e for tagged releases by drew · Pull Request #319 · NVIDIA/OpenShell

drew · 2026-03-15T06:01:27Z

Summary

Ensure all tagged release artifacts are gated by e2e tests, and remove unnecessary e2e gating from dev release Docker images.

Changes

release-tag.yml: Add e2e to publish-python needs so Python wheels are not published to Artifactory until e2e passes
release-dev.yml: Remove e2e from tag-ghcr-dev needs since dev Docker images don't need to wait for e2e
policy-advisor example: Replace gitlab-master.nvidia.com references with internal.corp.example.com

Updated gate table

Artifact	Tagged Release	Dev Release
Docker images (GHCR)	Gated by e2e	Not gated by e2e
Python wheels (S3/Artifactory)	Gated by e2e	Not gated by e2e
GitHub Release (CLI + wheels)	Gated by e2e (transitively)	Not gated by e2e

Testing

mise run pre-commit passes (YAML-only changes, no code)
Unit tests added/updated
E2E tests added/updated (if applicable)

Checklist

Follows Conventional Commits
Commits are signed off (DCO)
Architecture docs updated (if applicable)

- Add e2e to publish-python needs in release-tag.yml so wheels are not published to Artifactory until e2e passes
- Remove e2e gate from tag-ghcr-dev in release-dev.yml since dev Docker images do not need to wait for e2e
- Replace gitlab-master.nvidia.com references with generic example host in policy-advisor CTF example

…mand test

Switch the release canary from Docker-outside-of-Docker (host socket mount) to true Docker-in-Docker. The CI container now starts its own dockerd, so the gateway cluster container is a child process and 127.0.0.1 port bindings are reachable directly.

This enables testing the real zero-to-sandbox user path: a single `openshell sandbox create` that auto-bootstraps the gateway, pulls the cluster image, and creates a sandbox, with no --gateway-host workaround.

Dockerfile.ci changes:
- Add iptables (required by dockerd for container networking)
- Extract full Docker daemon suite (dockerd, containerd, runc) instead of CLI only

release-canary.yml changes:
- Remove /var/run/docker.sock volume mount
- Add dockerd startup step
- Remove gateway host resolution and explicit gateway start steps
- Simplify canary to single auto-bootstrap sandbox create command

The first canary run revealed two issues:

1. dockerd failed to start because docker-proxy was not extracted from the Docker static binary tarball. Add it to the extraction list.
2. The GitHub Actions runner injects its own Docker socket into job containers. Without an explicit DOCKER_HOST, the openshell CLI connected to the runner's host Docker daemon instead of our DinD daemon. Start dockerd on a dedicated socket (/var/run/dind.sock) and export DOCKER_HOST so all subsequent steps use it.
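As a rough illustration of the extraction list in item 1, the shell sketch below pulls a named set of binaries out of a static tarball. The helper name `extract_docker_binaries`, the tarball path, and the destination directory are hypothetical and not taken from Dockerfile.ci; the sketch also assumes the archive keeps its binaries under a top-level `docker/` directory.

```shell
#!/usr/bin/env bash
# Sketch only: helper name, tarball path, and destination are hypothetical.
# Assumes the archive stores its binaries under a top-level docker/ directory.

# extract_docker_binaries <tarball> <dest_dir> <binary>...
# Extracts only the named binaries from the tarball into dest_dir.
extract_docker_binaries() {
  local tarball="$1" dest="$2" name
  shift 2
  mkdir -p "$dest"
  for name in "$@"; do
    # --strip-components=1 drops the leading docker/ so files land in dest_dir.
    tar -xzf "$tarball" -C "$dest" --strip-components=1 "docker/${name}" || return 1
  done
}

# Example call (not executed here), with docker-proxy on the list:
#   extract_docker_binaries /tmp/docker.tgz /usr/local/bin \
#     dockerd containerd runc docker-proxy
```

Passing the binary names as arguments keeps the list in one place, so adding an entry such as docker-proxy is a one-word change.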

Using a custom socket path and DOCKER_HOST breaks the GitHub Actions runner's internal Docker operations (it uses docker exec to run steps inside the container). Since we removed the host socket volume mount, /var/run/docker.sock is free inside the container — just start dockerd on the default path with no DOCKER_HOST override needed.

The GHA runner injects its own /var/run/docker.sock into the container for management, so dockerd can't bind to the default path. Use a dedicated socket (/var/run/dind.sock) and set DOCKER_HOST only on steps that need it (via step-level env) to avoid breaking the runner.
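The commit scopes DOCKER_HOST through step-level env in the workflow. A shell analogue of the same idea, setting the variable for one command instead of exporting it, could look like the sketch below. `with_dind` is a hypothetical helper, not code from this PR; the socket path is the one named above.

```shell
#!/usr/bin/env bash
# Sketch only: with_dind is a hypothetical helper, not code from this PR.
DIND_SOCKET="${DIND_SOCKET:-/var/run/dind.sock}"

# with_dind <command>...
# Runs one command with DOCKER_HOST pointing at the dedicated socket.
# The variable is set for that command only, so the caller's environment
# (and anything else using the default socket) is left alone.
with_dind() {
  if [ "$#" -eq 0 ]; then
    echo "usage: with_dind <command>..." >&2
    return 2
  fi
  DOCKER_HOST="unix://${DIND_SOCKET}" "$@"
}

# Example calls (not executed here; they need docker and openshell installed):
#   with_dind docker info
#   with_dind openshell sandbox create
```

Because nothing is exported, commands that run without the wrapper still see whatever socket the runner provides.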

Each GHA step runs via docker exec which sends SIGHUP to backgrounded processes when the shell exits. Use nohup to detach dockerd from the step's process group so it persists across steps.

setsid creates a new session and process group, ensuring dockerd survives when the GHA runner's docker-exec shell exits between steps.

Background processes started via docker-exec don't persist across GHA steps — each step gets a fresh docker-exec invocation. Move dockerd startup into the canary test step itself so it shares the same shell session and stays alive for the duration of the test.
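A minimal sketch of the "same shell session" approach follows: the daemon is started in the background, polled until it answers, used, and then stopped, all inside one step. The function names, the 30-second timeout, and the log path are hypothetical; the socket path is the one used in the earlier commits, and the sketch assumes dockerd, docker, and openshell are on PATH.

```shell
#!/usr/bin/env bash
# Sketch only: function names, timeout, and log path are hypothetical.

# wait_until <timeout_seconds> <command>...
# Polls once per second until the command succeeds or the timeout elapses.
wait_until() {
  local timeout="$1" waited=0
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# canary_step: meant to be the body of a single CI step. dockerd is started
# in the background of this shell and stopped before the function returns.
canary_step() {
  local sock="unix:///var/run/dind.sock" status=0
  dockerd --host "$sock" >/tmp/dockerd.log 2>&1 &
  local dockerd_pid=$!
  if wait_until 30 docker --host "$sock" info; then
    DOCKER_HOST="$sock" openshell sandbox create || status=$?
  else
    echo "dockerd did not become ready; see /tmp/dockerd.log" >&2
    status=1
  fi
  kill "$dockerd_pid" 2>/dev/null || true
  wait "$dockerd_pid" 2>/dev/null || true
  return "$status"
}
```

Keeping startup, readiness check, and test in one function means the daemon's lifetime is tied to the step's shell, which is the property the earlier nohup and setsid attempts could not provide.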

The GHA container uses overlayfs, and the inner dockerd also defaults to overlayfs. Overlay can't be stacked, causing container creation to fail. Use --storage-driver=vfs which copies layers instead of layering them — slower but reliable for DinD.
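One way to make the storage-driver choice explicit is to inspect the root filesystem and fall back to vfs only when it is already overlay. Both helper names below are hypothetical, and choosing overlay2 for the non-nested case is an assumption for illustration rather than something the commit states.

```shell
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical.

# root_fstype: prints the filesystem type mounted at / (Linux, via /proc/mounts).
# If / appears more than once, the last entry wins.
root_fstype() {
  awk '$2 == "/" { fstype = $3 } END { print fstype }' /proc/mounts
}

# pick_storage_driver <fstype>
# Falls back to vfs when the backing filesystem is already overlay.
# overlay2 for every other filesystem is an assumption of this sketch.
pick_storage_driver() {
  case "$1" in
    overlay | overlayfs) echo "vfs" ;;
    *) echo "overlay2" ;;
  esac
}

# Example (not executed here; needs dockerd installed):
#   dockerd --storage-driver="$(pick_storage_driver "$(root_fstype)")"
```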

Add OPENSHELL_GATEWAY_HOST environment variable support to the sandbox create auto-bootstrap path. This mirrors the --gateway-host flag on `gateway start` but works for the implicit bootstrap triggered by `sandbox create` when no gateway exists.

In CI containers using Docker-outside-of-Docker (host socket mount), 127.0.0.1 inside the CI container doesn't reach sibling gateway containers. Setting OPENSHELL_GATEWAY_HOST=host.docker.internal fixes this without requiring the two-step gateway-start-then-sandbox-create workflow.

Update the release canary to use the single-command path: just `openshell sandbox create`, which auto-bootstraps everything. For workflow_dispatch (branch testing), the canary builds the CLI from source to test the current branch code. For workflow_run (release testing), it installs the published binary.

Use the explicit --gateway-host flag on gateway start (works with current published CLI) while also setting OPENSHELL_GATEWAY_HOST env var (will be picked up once the next release ships with env var support). Once the env var support is released, the canary can switch to the single-command sandbox create path.
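The two canary paths described above can be sketched side by side. The function names are hypothetical, and passing the flag's value as a separate space-delimited argument is an assumption; the commands, flag name, variable name, and host value are the ones given in the commit messages.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and the space-separated
# form of --gateway-host is an assumption about the CLI's argument syntax.
GATEWAY_HOST="host.docker.internal"

# Two-step path: works with a CLI that only understands the flag.
canary_two_step() {
  openshell gateway start --gateway-host "$GATEWAY_HOST" &&
    openshell sandbox create
}

# Single-command path: relies on the env var being read during the
# auto-bootstrap that `sandbox create` triggers when no gateway exists.
canary_single_command() {
  OPENSHELL_GATEWAY_HOST="$GATEWAY_HOST" openshell sandbox create
}
```

During the transition the canary would call the two-step variant; once a release with env var support is published, the single-command variant exercises the same path a new user takes.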

The canary uses DooD (host socket mount), not DinD, so the dockerd, containerd, runc, docker-proxy, and iptables additions are unnecessary.

The gateway host override is useful in any environment where the client can't reach the Docker host at 127.0.0.1 — CI containers, WSL, remote Docker hosts, etc. Update the CLI help text, DeployOptions doc comment, and bootstrap env var comment to reflect this.

drew added 2 commits March 14, 2026 22:58

wip

0033d97

drew self-assigned this Mar 15, 2026

drew added 12 commits March 14, 2026 23:25

fix(ci): use nohup for dockerd to survive between GHA steps

45f3fd3

fix(ci): use setsid to fully detach dockerd from GHA step shell

291f48f

revert: remove DinD additions from Dockerfile.ci

30792c3

drew merged commit aa2e271 into main Mar 15, 2026
9 checks passed

drew deleted the more-ci-updates-2 branch March 15, 2026 07:39

drew added a commit that referenced this pull request Mar 16, 2026

ci(release): gate python wheels on e2e for tagged releases (#319)

0a7ffe1

drew merged 14 commits into main from more-ci-updates-2
